Positional cloning involves first finding linkage between an inherited phenotype (such as a disease) and a DNA 
marker, followed by the use of a variety of physical and genetic mapping techniques to move from linkage to 
mutation. If there is a founder effect within a population, crossovers are often rare between the mutation causing 
the phenotype and closely situated markers and increasing disequilibrium may be observed as the site of the 
mutation is approached. Standard coefficients of disequilibrium may, however, be insensitive to the relative 
position of close markers and the mutation, because they depend upon allele frequencies in the normal population 
compared to those of the founder chromosome. Using cystic fibrosis (CF) in European populations as a model system, 
alternative methods for determining the position of a mutation are discussed. These include haplotype parsimony 
and three-way interval likelihood analysis. Both methods predict the location of the major CF mutation accurately 
from a real set of more than 600 European CF chromosomes. 



INTRODUCTION 

Haplotype analysis of normal and disease-associated 
chromosomes provides information for mapping a mutation 
within tightly linked DNA markers. Linkage disequilibrium has 
been used to determine how close a particular polymorphic 
marker is to a disease locus, but it is highly sensitive to the 
frequency of the polymorphic alleles, and to whether the common 
allele on normal chromosomes is the same as the common allele 
on the chromosome carrying the mutation (1-3). This paper explores 
the use of different methods of haplotype analysis to predict 
the position of a disease-causing mutation within a set of 
closely linked markers. 
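As a minimal illustration of this frequency sensitivity, the sketch below assumes one common standardised coefficient, delta = (p_CF - p_N)/(1 - p_N), where p_CF and p_N are the frequencies of the associated allele on CF and normal chromosomes. Whether this is exactly the coefficient used in references 1-3 is an assumption, and the frequencies are invented. The same frequency on CF chromosomes gives quite different values as the background frequency changes.

```python
def standardised_delta(p_cf: float, p_n: float) -> float:
    """Standardised disequilibrium, delta = (p_cf - p_n) / (1 - p_n).

    p_cf: frequency of the allele of interest on CF chromosomes
    p_n:  frequency of the same allele on normal chromosomes
    """
    if not 0.0 <= p_n < 1.0:
        raise ValueError("p_n must lie in [0, 1)")
    return (p_cf - p_n) / (1.0 - p_n)


# Hypothetical: allele at 0.9 on CF chromosomes, three background frequencies.
for p_n in (0.2, 0.5, 0.8):
    print(f"p_N = {p_n:.1f}  delta = {standardised_delta(0.9, p_n):.3f}")
```

The printed values fall from 0.875 to 0.500 although the CF-chromosome frequency is unchanged, which is why delta alone is a poor guide to the relative position of closely spaced markers.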

The cystic fibrosis (CF) locus was mapped by linkage analysis 
to chromosome 7q31 (4-7), and it was shown that there is locus 
homogeneity for this disorder. Alleles of some closely linked 
markers show significant linkage disequilibrium with the CF 
mutation in every population tested, particularly in northern 
Europe. This suggests that the majority of CF chromosomes share 
a common origin and carry the same mutation (8-16). A large 
haplotype data set was collected for markers close to and within 
the CF locus on CF and non-CF chromosomes in ten European 
populations. 

In the identification of the CF gene, there were no associated 
chromosomal anomalies to aid movement from linkage to locus. 
Therefore, positional cloning was used to identify and characterise 
the CF transmembrane conductance regulator (CFTR) gene (1, 
17). The common disease-causing mutation was identified as a 
3 bp deletion in exon 10, leading to the loss of a phenylalanine 
residue in the first nucleotide binding fold. This common 
mutation, ΔF508, is present on between 70% and 80% of CF 
chromosomes of north European origin, but on only 35% to 50% 
of those of southern European origin, indicating that there are many other 
CF-causing mutations (18). Mutation homogeneity is thus greater 
in northern Europe. In this study, over 
600 CF-associated and over 600 non-CF-associated chromosome 
haplotypes were determined in 10 European populations, using 
six RFLP markers closely linked to the CF mutation. Two of the 
markers, pG2 and H80, are situated within the CFTR gene, 3' 
to the common ΔF508 mutation (19). 

There are four major prerequisites for the methods of analysis 
described here: 1. the existence of a single predominant disease-causing 
mutation; 2. the location of this mutation within a 
specific haplotype, referred to as the 'ancestral' 
haplotype, which is the commonest disease-associated haplotype 
in all populations studied; 3. knowledge of the physical order 
of the markers; 4. the ability to analyze different disease-associated 
haplotypes which have arisen from the 'ancestral' type 
through historical recombination. 

The analyses identified a region of 225 kb, between the markers 
pMP6d-9 and pG2, as the most likely position for the common 
CF-causing mutation. This is, in fact, the position of the ΔF508 
mutation. The occurrence of the ΔF508 mutation within the 
haplotypes of the present study was not known when the analyses were done, and is still not known. 

The complete compilation of haplotypes is made available so 
that others may perform alternative analyses. The analysis 
of multi-site haplotypes may well offer a paradigm for future 
studies of other genetically homogeneous disorders. 

RESULTS 

Allelic association at each of the loci 

Allele frequencies for CF and non-CF chromosomes, standardised 
linkage disequilibrium (delta), odds ratio, and chi squared 
homogeneity and heterogeneity calculations are shown in Table 1 
for each of the six RFLPs: pXV-2c/TaqI, pKM.19/PstI, 
pMP6d-9/MspI, pG2/XbaI, H80/PstI and pJ3.11/MspI. Data for 
MET have been reported previously and are not included in the 
table. The physical order of the DNA markers and the distances 
between them are shown in Figure 1, as are the calculated odds 
ratios for each of the loci and the associated 95% confidence 
limits. For arithmetic convenience, the alleles were inverted for 
pKM.19, pMP6d-9, pG2 and H80. A marked increase in allelic 
association is seen for pKM.19 and pMP6d-9, which are not 
significantly different from one another, with a decrease in 
association for the other markers in both the pJ3.11 and MET 
directions. The allele frequencies for the different populations 
are homogeneous with respect to CF for five of the loci. There 
is, however, significant (p<0.01) heterogeneity for the H80/PstI 
alleles. 
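The odds ratios and 95% confidence limits plotted in Figure 1 can be sketched for a single biallelic marker as below. The text does not state how the intervals were computed; Woolf's logit interval is assumed here, and the counts are hypothetical rather than taken from Table 1.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a: CF chromosomes with allele 1      b: CF chromosomes with allele 2
    c: non-CF chromosomes with allele 1  d: non-CF chromosomes with allele 2
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(90, 10, 40, 60))
```

Under this construction, two markers whose intervals do not overlap (as reported for pG2 and H80 against pKM.19 and pMP6d-9) show clearly different strengths of association.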

Three-way analysis of loci 

Three-way analysis to determine the most likely position of the 
major CF mutation was done for each marker-marker 
combination using the following markers: pXV-2c, pKM.19, 
pMP6d-9, pG2 and H80. CF and non-CF haplotypes of the paired 
loci are shown in Table 2 for all the populations combined and 
then for the northern (Finnish, Danish, both English, Welsh and 
French) and southern (Spanish, both Italian and Bulgarian) data 
subsets. These data were used in calculations which aim to 
exclude or include the CF mutation from each of the marker-marker 
intervals. Several assumptions are made: the 
predominant CF mutation has occurred only once, on the 
'ancestral' CF haplotype; other CF haplotypes have arisen from 
the 'ancestral' haplotype through recombination; sequential 
recombination events on the same chromosome are negligible; 
and the frequency of alleles on normal chromosomes has not changed 
significantly since the CF mutation occurred. 

Each marker-marker pair was examined sequentially, as 
exemplified in the Methods section. All the chi squared values 
which show significant exclusion (p<0.001) of the CF mutation 
from either within or outside marker-marker intervals are given 
in Table 3. The results are shown schematically in Figure 2, where 
the bars of exclusion for CF are proportional in thickness to the 
size of the chi squared. The chi squared values cannot be added 
across intervals because some of the same data have been used 
to calculate overlapping regions of exclusion. This method of 
analysis would probably have more power if more markers were used 
together. 

The northern European populations show greater homogeneity 
than the southern European populations when analyzed separately. 
Three-way interval probability analysis for the northern group 
shows that the CF mutation is most probably between pMP6d-9 
and pG2. Each of the other intervals is excluded at p<0.001. 
The same region shows the smallest exclusion value for the 
southern European group, although in this data subset every 
interval is technically excluded. 

Table 1. Distribution of alleles at five polymorphic markers closely linked to the CF mutation in ten European populations 

Haplotype analysis 

Two sets of haplotype data were compiled by hand. The first 
consists of haplotypes for CF and non-CF chromosomes with 
the four-marker combination pXV-2c, pKM.19, pMP6d-9 and 
pG2 (Table 4). In total, 547 CF haplotypes were unambiguously 
assigned. Complete information was obtained for an additional 
102 chromosomes, but phase could not be deduced. As expected, 
a higher proportion (0.23) of southern European CF 
chromosomes were not determinable when compared to the 
northern group (0.10). The second set consists of haplotypes 
including five markers, the above four and H80, presented in 
Table 5. In this case a total of 278 CF haplotypes were 
unambiguously assigned and in a further 70 phase could not be 
determined. The main reason for the smaller numbers in the 
second set is that the Manchester, Welsh and Urbino families 
were not typed for H80, and in many of the other populations 
the H80 data were incomplete. 

The CF haplotypes were analyzed for compatibility with the 
proposed major ancestral CF haplotype 1 2 2 2 2. They were 
compared with regard to the alleles they have in common with 
the ancestral type. The shaded boxes around the haplotypes 
highlight the ancestral type or part thereof (Figure 3). The vertical 
lines show the interval which is most likely to include the CF 
mutation. There was no apparent difference between the northern 
and southern groups. When the four-locus haplotypes are 
analyzed, 54 CF chromosomes are incompatible with the CF 
mutation being between pKM.19 and pMP6d-9 and only two are 
incompatible with the interval pMP6d-9 to pG2. Similarly, in 
the five-locus analysis 27 and 1 CF chromosomes, respectively, are 
incompatible with these intervals, indicating that the CF 
mutation is most likely to be between pMP6d-9 and pG2. 
Incompatibility refers to a situation where neither of the alleles 
at the markers flanking the proposed position of the CF mutation 
is the ancestral type. 

DISCUSSION 

A large body of data on polymorphic markers physically and 
genetically close to CF has been collected and analyzed with 
the aim of predicting the position of the CF mutation among them. 
The order of the markers has been well established by physical 
mapping using pulsed field gel electrophoresis (7, 10, 19, 21, 22). 
Before the cloning of the CF gene, in the absence of chromosomal 
translocations and deletions, it was difficult to order close markers 
with respect to the major mutation. When linked markers are 
at a genetically determinable distance from the mutation, it is 
possible to use information from recombinant families, but when 
the distance becomes so small that recombinant families are very 
rare, this is no longer possible. In such cases alternative methods 
are needed. Here, haplotypes constructed from closely linked 
markers have been analyzed in order to determine 
the position of the major CF mutation. 

Three different methods of analysis were used. The first is the 
widely used method of allelic association (20). Figure 1 shows 
that alleles at pKM.19 and pMP6d-9 clearly have the highest 
association with CF, indicating that they are likely to be the closest 
markers to the mutation. They are not significantly different from 
one another and, theoretically, the CF mutation could lie between, 
or to either side of but close to, these markers. There is a marked 
decrease in association for markers towards both MET and 
pJ3.11. The two loci pG2 and H80 show significantly lower 
and non-overlapping odds ratio values when compared to pKM.19 
and pMP6d-9, indicating that the CF mutation is unlikely to lie 
on the pJ3.11 side of them. There is a tendency for delta values 
to be slightly lower in the southern European populations for the 
markers pXV-2c, pKM.19 and pMP6d-9, indicating more 
heterogeneity than in the northern group, as reported previously 
(8, 9). 

Figure 1. Physical distance and relative positions of markers closely linked to and flanking the major CF mutation, ΔF508. Scale is indicated in the box, top right. The vertical axis shows the odds ratio and 95% confidence intervals as a measure of allelic association for each of the polymorphic markers with CF. The value for MET was calculated from the data for the TaqI/metH RFLP as reported in reference 11. 

Table 2. Pairwise analysis of all the marker/marker combinations. Haplotype numbers and standardised linkage disequilibrium values are given for the combined sample (TOTAL), then separately for the north and south European populations, on CF and non-CF chromosomes. 
CF = CF-associated chromosomes 
N = non-CF chromosomes. 

The analysis of standardised linkage disequilibrium coefficients 
on CF and non-CF chromosomes showed that the values were 
not consistently higher on CF chromosomes than on non-CF 
chromosomes, as might be expected. The pairwise 
comparisons where delta values were higher on non-CF 
chromosomes were: pKM.19, pG2; pKM.19, H80; pMP6d-9, 
pG2; pMP6d-9, H80; and pG2, H80. Each of these comparisons 
includes either pG2 or H80, and an explanation may be that a 
second CF mutation has occurred on chromosomes where 
pG2 and/or H80 carry allele 1. Seven percent of CF chromosomes 
in the north fall within this group, and 18% in the south. 
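Pairwise marker-marker disequilibrium on CF and non-CF chromosomes can be sketched from two-locus haplotype counts. The text does not name the normalisation behind the Table 2 values, so Lewontin's D' is used here purely as an assumed, commonly used example; the function name and counts are hypothetical. The same function would be applied separately to the CF and non-CF haplotype tables and the two values compared.

```python
def d_prime(n11: int, n12: int, n21: int, n22: int) -> float:
    """Lewontin's normalised disequilibrium D' for two biallelic markers.

    nXY = number of chromosomes with allele X at marker A and allele Y at
    marker B. Returns a value in [-1, 1]; the sign refers to haplotype 1-1.
    """
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n          # frequency of allele 1 at marker A
    q1 = (n11 + n21) / n          # frequency of allele 1 at marker B
    d = n11 / n - p1 * q1         # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    return d / d_max if d_max > 0 else 0.0


# Hypothetical two-locus haplotype counts on one class of chromosomes.
print(d_prime(40, 10, 20, 30))
```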

The other two analytical methods assume that the CF mutation 
occurred only once on an 'ancestral' CF haplotype, that any other 
CF haplotypes have arisen from it through recombination with 
non-CF chromosomes, that sequential recombination events are rare, 
and that the RFLPs studied predate the CF mutation. 

Table 3. Pairwise analysis of markers closely linked to the CF mutation. Chi squared values are given where the CF mutation is excluded from physical intervals at p<0.001. 
X = position of the recombination event. 
XV, pXV-2c; KM, pKM.19; D9, pMP6d-9; G2, pG2. 

In the north European subset 81% of the CF chromosomes 
are haplotype 1 2 2 2 (pXV-2c, pKM.19, pMP6d-9, pG2). This 
haplotype is present on 63% of southern European CF 
chromosomes, indicating that this subset is less homogeneous. 
It is therefore reasonable to assume, at least in the northern group, 
that there is a single predominant mutation which causes CF, 
and that other mutations at this locus make a small contribution to 
the total data. It is now known that the major CF mutation, 
ΔF508, is present on 70 to 80% of CF chromosomes in north 
European populations and 35 to 45% of south European 
populations (18). 

There is some evidence suggesting that groups of patients with 
severe symptoms have a more homogeneous haplotype 
composition, whereas patients with milder symptoms have several 
different haplotypes (17). It would have been preferable to 
analyze the data from a clinically homogeneous group of CF 
patients, but data on severity are not available for most of the 
cohorts presented here. If a second ancestral chromosome had 
made a large contribution to the haplotypes in our data set, this 
might be apparent as a second common haplotype associated with 
CF. Haplotype data with the present markers are not available 
for patients assigned to the pancreatic insufficient (PI) or pancreatic sufficient (PS) groups, but a previous publication 
has indicated that a high proportion of PS cases carry allele 1 at 
a locus identical to pKM.19 (17). Recent studies on Italian, 
Belgian and German CF families showed that although there 
is a strong correlation between the ΔF508 mutation and the 
'ancestral' haplotype, this haplotype has also been found in 
association with other CF mutations (23-25). In the Italian cohort 
of 212 CF families, 27% of CF-associated chromosomes with the 
1 2 2 (pXV-2c, pKM.19, pMP6d-9) haplotype had a non-ΔF508 
mutation (23). 

Figure 2. Schematic representation of the three-way exclusion analysis of the haplotype data. Chi squared values for exclusion at p<0.001 are shown as blocks between the markers, and their height is proportional to their chi squared values. The data for the northern and southern European populations are analysed separately. The boxes are shaded when exclusion data were generated from adjacent markers. XV = pXV-2c; KM = pKM.19; D9 = pMP6d-9; G2 = pG2. 

Table 4. Distribution of CF- and non-CF-associated (CF and N) haplotypes (pXV-2c, pKM.19, pMP6d-9, pG2) in European populations 

When analyzing haplotype data, a certain number of 
chromosomes are lost to analysis because of incomplete data, 
and because of haplotypes for which phase cannot be determined when 
both parents and all available children are heterozygous at one 
or more loci. The second factor is of greater concern because 
it is not random, and potentially lowers the proportion of CF 
haplotypes which are not the 'ancestral' type: inability 
to determine phase is often due to the occurrence of the 'ancestral' 
CF haplotype in conjunction with a 'rare' CF haplotype. This 
effect will be less marked in the set of non-CF chromosomes, 
as there is a more widespread distribution of haplotypes, and loss 
of information is more likely to be random. 
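The phase problem can be sketched for a single parent-parent-child trio: at each biallelic locus, the allele a parent transmitted is determinable unless the parent, the other parent and the child are all heterozygous. The helpers below are hypothetical and simplified; they ignore the extra information available from additional children, which can resolve phase when not all of them are heterozygous.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[int, int]  # unordered pair of alleles at one locus, e.g. (1, 2)


def transmitted(parent: Genotype, other: Genotype, child: Genotype) -> Optional[int]:
    """Allele the child received from `parent` at one locus.

    Returns None when it cannot be deduced: this happens when parent, other
    parent and child are all heterozygous (or when the genotypes are
    Mendelian-inconsistent, which leaves no candidate at all).
    """
    c1, c2 = child
    candidates = {a for a, b in ((c1, c2), (c2, c1)) if a in parent and b in other}
    return candidates.pop() if len(candidates) == 1 else None


def transmitted_haplotype(parent: Sequence[Genotype], other: Sequence[Genotype],
                          child: Sequence[Genotype]) -> List[Optional[int]]:
    """Per-locus alleles passed from `parent` to `child`; None marks unresolved phase."""
    return [transmitted(p, o, c) for p, o, c in zip(parent, other, child)]


# Hypothetical trio typed at three loci.
father = [(1, 2), (2, 2), (1, 2)]
mother = [(1, 1), (1, 2), (1, 2)]
child = [(1, 2), (2, 2), (1, 2)]
print(transmitted_haplotype(father, mother, child))
```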

In the three-way analysis of the data, the hypotheses tested were 
that the CF mutation is between or outside each marker-marker 
interval. Chi squared values were determined to examine the 
likelihood of the CF mutation lying in each interval, and the position 
was rejected when a significance level of p<0.001 was obtained. 
The schematic representation of the data in Figure 2 shows that 
in the northern group there was no exclusion in the interval pG2 
to pMP6d-9, indicating that this is the most likely position for 
the CF mutation. In the southern subset, this interval showed 
the lowest level of exclusion. The interval pKM.19 to pMP6d-9 
was consistently excluded with high chi squared values. 

Abbreviations as for Table 4. 
The analysis of the markers pKM.19 and pMP6d-9 is of 
interest, as it shows the greatest difference between the northern 
and southern data sets. The haplotypes 1 2 and 2 1 (pKM.19, 
pMP6d-9) are over-represented on CF chromosomes in the 
southern subset, which can be explained either by an early 
recombination event between these markers or by further CF 
mutations which occurred in the south on those specific 
haplotypes. Neither of the marker orders CF pKM.19 pMP6d-9 
nor pKM.19 pMP6d-9 CF is excluded in the north European 
data set, suggesting that the two markers are in strong 
disequilibrium. The order CF pKM.19 pMP6d-9 is excluded 
with a lower chi squared value than the correct locus order, 
pKM.19 pMP6d-9 CF, though the difference may not be 
significant. 

The haplotypes were also analyzed using the parsimony method 
to determine the position of the CF mutation as predicted by the 
least number of crossover events. The most common CF 
haplotype is 1 2 2 2: 81% in the north and 63% in the south. 
The next most common haplotypes on CF chromosomes are 
1 1 1 2, 2 1 1 2 and 2 2 2 2. The first two could have arisen through 
recombination between pMP6d-9 and pG2 (Δ = -0.25 on non-CF 
chromosomes) and the third by recombination between 
pXV-2c and pKM.19 (Δ = -0.05 on non-CF chromosomes). 
Delta values for the other intervals, pKM.19 to pMP6d-9 and 
pG2 to H80, are both 0.64 on non-CF chromosomes, and it is 
therefore less likely that recombination will occur between them. 
It is thus reasonable to assume that the three most common 
haplotypes shown above are derived from the ancestral CF 
chromosome by single historical crossover events. The schematic 
representation of the data in Figure 3 shows that the interval 
pMP6d-9 to pG2 is the most likely position of the CF mutation. 
The interpretation of the data may well be influenced by the 
relative lack of informativeness of the pG2 and H80 RFLPs, since 
a proportion of recombinant events between them and adjacent 
markers will not be visible. This is the reason for not quantifying 
the observations from the CF haplotypes by computing the 
expected numbers of each recombination event. 
Figure 3. CF-associated haplotypes for the four (A) and five (B) marker haplotype 
data in the combined data set. The ancestral CF haplotype and portions thereof 
have been placed in shaded boxes. The dashed vertical line shows the region 
of maximum overlap, which is the most likely position of the CF mutation. XV 
= pXV-2c; KM = pKM.19; D9 = pMP6d-9; G2 = pG2. 
We have attempted to present the data in full, in order that 
others may use different methods of analysis to position the CF 
mutation relative to the DNA markers. The allelic association 
calculations indicate that the major CF mutation should be close 
to the markers pKM.19 and pMP6d-9. Both other analytical 
methods argue for the location of the CF mutation between 
pMP6d-9 and pG2, an interval of 225 kb. This is most 
convincingly shown for the more homogeneous north European 
cohort, using the three-way interval probability analysis, and (for 
the total data set) by haplotype analysis by inspection. The ΔF508 mutation 
is, in fact, situated within this 225 kb region, showing that these 
methods accurately predict its location. The methods presented 
here should be applicable to other systems where closely linked 
polymorphic markers cannot be ordered with respect to a 
mutation. 

MATERIAL AND METHODS 

CF families 

CF families from the following eleven centres in Europe were included in the 
present study: Finland (10 families); Denmark (29); England, Manchester (46); 
England, Birmingham (33); Wales, Cardiff (14); France, Paris (58); France, 
Montpellier (6); Spain (110); Italy, Verona (48); Italy, Urbino (45); and Bulgaria 
(25). In total, 433 nuclear families with at least one living affected child were 
studied to deduce the phase of the haplotypes. The families are not known to 
be related. Each of the groups has been analyzed separately, with the exception 
of the French cohorts, which were pooled because of the small sample from 
Montpellier. Subsets of the data from several of the populations were reported 
previously (7, 9, 10, 26-30). 

Population description and clinical profile 

The incidence of CF is similar in most European populations, with a carrier 
frequency of 0.04 to 0.05, with the exception of Finland, where it is considerably 
lower, at 0.013. CF diagnostic criteria included at least one positive sweat test 
and pulmonary and/or pancreatic disease. In Finland and Denmark CF is clinically 
very homogeneous and all but one CF family from each group have pancreatic 
insufficiency (PI); the English and Welsh CF families are also mostly PI. The 
French cohort are mostly of Celtic origin and reside in north-west Brittany; they 
are also predominantly PI and were included in the northern European data subset. 
The Italian CF families from Verona come from the Veneto region of north-east 
Italy and those from Urbino are mainly from central Italy; they show a higher 
proportion (0.15 to 0.20) of pancreatic sufficient (PS) families than in northern 
Europe. The Spanish CF families are from all over the country and it is not known 
what proportion are PI; the Italian and Spanish families were assigned to the 
southern European subset (29). No clinical data were available for the Bulgarian 
CF families, but they showed a similar genetic profile to other southern European 
groups and were included in this subset. 

RFLPs linked to the CF locus 

The RFLPs included the known CF flanking markers, pXV-2c and pJ3.11, and 
several markers between them. Each of the markers has been previously described: 
pXV-2c and pKM.19 (7, 8), pMP6d-9 (10), pG2 and H80 (19) and pJ3.11 (5). 
pXV-2c, pKM.19, pMP6d-9, pG2, H80 and pJ3.11 were used to detect TaqI, 
PstI, MspI, XbaI, PstI and MspI RFLPs, respectively. Probe inserts were 
radiolabeled by random oligonucleotide primed DNA synthesis (31). Family DNA 
was prepared from peripheral blood by standard methods. Genomic DNA was 
digested with the appropriate enzyme using conditions recommended by the 
manufacturer, size fractionated on agarose gels, capillary transferred to nylon 
membranes, hybridised and autoradiographed. 

Analysis of data 

Three methods of analysis were used. 

1. Allelic association at each of the loci. Allele frequencies at each locus were 
determined using the gene counting method. Standardised linkage disequilibrium, 
relative risk, chi squared heterogeneity and homogeneity, and odds ratio calculations 
were described previously (8, 9, 20). 
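A chi squared heterogeneity test across populations can be sketched as a Pearson test on a populations by alleles table of counts, with (rows - 1) x (columns - 1) degrees of freedom. This is a standard formulation assumed here for illustration; the exact procedure of references 8, 9 and 20 may differ, and the counts below are hypothetical.

```python
def contingency_chi2(table):
    """Pearson chi squared and degrees of freedom for an r x c table of counts.

    Rows might be populations and columns allele counts on CF chromosomes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity: row total x column total / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical allele 1 / allele 2 counts on CF chromosomes in three populations.
print(contingency_chi2([[80, 20], [75, 25], [50, 50]]))
```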

2. Three-way analysis of loci. Haplotypes were assigned by hand from genotypings 
of nuclear CF families. A computer program, HAPLOT, was written for the 
purpose of calculating chi squared and G values for each of the possible positions 
of the CF locus with respect to each pairwise combination of markers. The log 
likelihood ratio, G, was used in preference to the chi squared when any observed 
or expected value was less than 5. G is defined by the relation 
$G = 2\sum f_{obs}\ln(f_{obs}/f_{exp})$, and its distribution approximates that of the chi squared statistic 
(32). The algorithm of the analysis is explained with worked examples below. 
Since HAPLOT examines marker pairs within haplotypes, it is possible to include 
some of the data from incomplete haplotypes and from haplotypes where phase 
could not be determined. 
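The rule for choosing between the two statistics can be sketched as a small function. HAPLOT itself is not available here, so this is a hypothetical re-implementation of the stated rule only. Cells with an observed count of zero contribute nothing to G, the usual limit of x ln x as x tends to 0.

```python
import math


def goodness_of_fit(observed, expected):
    """Return ('G', value) or ('chi2', value) for paired observed/expected counts.

    Uses G = 2 * sum(obs * ln(obs / exp)) when any observed or expected count
    is below 5, otherwise Pearson's chi squared, following the rule in the text.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(observed) < 5 or min(expected) < 5:
        g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
        return "G", g
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return "chi2", chi2


print(goodness_of_fit([64, 73], [102.94, 34.06]))  # large counts: chi squared
print(goodness_of_fit([3, 7], [5, 5]))             # small counts: G
```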

The proposed major 'ancestral' CF haplotype is 1 2 2 2 2 (pXV-2c, pKM.19, 
pMP6d-9, pG2, H80), and only the CF haplotypes which differ from it, i.e. the 
recombinant haplotypes, are analyzed. The data were analysed as a combined 
set and as separate northern and southern European data sets. As 
an example, the markers pXV-2c and pKM.19 are analyzed with respect to the 
most likely position of the CF mutation. One examines the likelihood of the 
following locus orders: CF pXV-2c pKM.19; pXV-2c CF pKM.19; and pXV-2c 
pKM.19 CF. It has to be borne in mind that the locus order pXV-2c CF pKM.19 
can have recombinations on CF chromosomes in two possible positions: 
pXV-2c x CF pKM.19 and pXV-2c CF x pKM.19 (x = position of crossover). 
In order to maximise the power of the three-way analysis, the locus1 x CF locus2 
and locus1 CF x locus2 analyses, which test the same locus order, should in some 
way be combined. 

In the analysis of the combined data set, the observed number of CF x pXV-2c 
recombinants is the number of CF chromosomes with pXV-2c allele 2, since 
the ancestral haplotype has allele 1. There are 64 CF-associated 2 1 haplotypes 
(pXV-2c, pKM.19) and 73 CF-associated 2 2 haplotypes, making the total number 
of recombinant chromosomes studied in this analysis 137. Since this type of 
recombination event would place alleles originally found on non-CF chromosomes 
adjacent to the CF mutation, we would expect to find haplotypes 2 1 and 2 2 
at a frequency similar to that found on non-CF chromosomes, assuming the locus 
order to be CF pXV-2c pKM.19. By counting the numbers of 2 2 and 2 1 
haplotypes on non-CF chromosomes we can calculate their frequencies and obtain 
the expected values for the recombinant chromosomes by multiplying each 
frequency by the total number of observed recombinant chromosomes. In this 
case the frequency of the non-CF-associated haplotype 2 1 is 0.75 (278/370) and 
that of haplotype 2 2 is 0.25 (92/370). The expected number of CF chromosomes 
with haplotype 2 1 is therefore 102.94 (0.75 x 137) and that with haplotype 2 2 
is 34.06 (0.25 x 137). To assess this difference statistically we used the chi 
squared test. For this example we found the chi squared to be 55.58 (p<0.001), 
and therefore it is unlikely that the CF mutation lies toward the centromere from 
pXV-2c. 
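The expected-count step of this worked example can be reproduced with the sketch below; the helper names are hypothetical. It recovers the expected values 102.94 and 34.06 from the quoted counts. The uncorrected Pearson statistic it prints (about 59.2) may not match the reported value, since the text does not say whether a continuity correction or rounded frequencies were used.

```python
def expected_from_background(cf_counts, noncf_counts):
    """Expected counts for recombinant CF haplotypes, assuming they carry
    background alleles at the relative frequencies seen on non-CF chromosomes.

    Both arguments map haplotype tuples to counts; only haplotypes present in
    cf_counts are used to form the background frequencies.
    """
    n_recombinant = sum(cf_counts.values())
    n_background = sum(noncf_counts[h] for h in cf_counts)
    return {h: n_recombinant * noncf_counts[h] / n_background for h in cf_counts}


def pearson_chi2(observed, expected):
    """Pearson chi squared summed over the haplotype classes."""
    return sum((observed[h] - expected[h]) ** 2 / expected[h] for h in observed)


# Counts quoted in the worked example: (pXV-2c, pKM.19) haplotypes with pXV-2c allele 2.
cf = {(2, 1): 64, (2, 2): 73}
non_cf = {(2, 1): 278, (2, 2): 92}
expected = expected_from_background(cf, non_cf)
print({h: round(v, 2) for h, v in expected.items()})  # {(2, 1): 102.94, (2, 2): 34.06}
print(round(pearson_chi2(cf, expected), 2))
```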

Using the same markers, the next hypothesis tested is that the CF mutation 
lies between the markers pXV-2c and pKM.19 and that the recombinants have 
occurred between CF and pXV-2c. In this case, the recombinants from the 
'ancestral' haplotype would be CF chromosomes containing pXV-2c allele 2. 
Only haplotype 2 2 would be a single recombinant, since haplotype 2 1 would 
imply a double recombination event from the major 'ancestral' haplotype. Double 
recombinants with haplotype 2 1 would be expected to be very rare. The expected 
number of chromosomes with haplotype 2 2 is calculated using the frequency 
of pXV-2c allele 2 on non-CF chromosomes and is 65.76 (0.48 x 137). The chi 
squared is 55.40 (p<0.001), indicating that it is unlikely that CF lies between 
these markers. 

The next hypothesis tested is that the CF mutation lies between the markers 
pXV-2c and pKM.19 and that the recombinants have occurred between CF and 
pKM.19. In this case, the recombinants from the 'ancestral' haplotype would be 
CF chromosomes containing pKM.19 allele 1. Only haplotype 1 1 would 
be a single recombinant, since haplotype 2 1 would imply a double recombination 
event from the major 'ancestral' haplotype. The expected number of chromosomes 
with haplotype 1 1 is calculated using the frequency of pKM.19 allele 1 on non-CF 
chromosomes and is 95.04 (0.72 x 132). The chi squared is 64.67 (p<0.001), 
confirming that it is unlikely that CF lies between these markers. 

The final hypothesis is that the CF locus lies telomeric to both pXV-2c and 
pKM.19. Recombinant CF chromosomes will have pKM.19 allele 1, and therefore 
one considers the frequencies of non-CF-associated haplotypes 1 1 and 2 1 (pXV-2c, 
pKM.19). The expected number of 1 1 haplotypes is 66 (0.5 x 132) and that 
for haplotype 2 1 is also 66 (0.5 x 132). The chi squared is 0.06 (p=0.806), 
indicating that the CF mutation is most likely to be telomeric to pKM.19, since 
it has been more significantly excluded from each of the other intervals. 
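Each hypothesis above compares two haplotype classes, so the statistic is assumed to have one degree of freedom, which is consistent with the reported p=0.806 for a chi squared of 0.06. For one degree of freedom the upper-tail probability has a closed form, sketched below with the Python standard library only; the function name is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi squared variable with one
    degree of freedom, using the identity P = erfc(sqrt(x / 2))."""
    if x < 0:
        raise ValueError("chi squared statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


print(round(chi2_sf_1df(0.06), 3))  # final hypothesis in the worked example
print(chi2_sf_1df(10.83) < 0.001)   # about 10.83 is the p = 0.001 threshold for 1 df
```

Any of the exclusion statistics reported above (55.58, 55.40, 64.67) lies far beyond the 10.83 threshold, so each corresponds to p<0.001.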

This approach does, however, have several deficiencies and should be regarded 
as an attempt to formalise the three-way analysis, a method upon which others 
will doubtless improve. 

3. Haplotype analysis by inspection. 